Evading Forensic Classifiers with Attribute-Conditioned Adversarial Faces
The ability of generative models to produce highly realistic synthetic face
images has raised security and ethical concerns. As a first line of defense
against such fake faces, deep learning based forensic classifiers have been
developed. While these forensic models can detect whether a face image is
synthetic or real with high accuracy, they are also vulnerable to adversarial
attacks. Although such attacks can be highly successful in evading detection by
forensic classifiers, they introduce visible noise patterns that are detectable
through careful human scrutiny. Additionally, these attacks assume access to
the target model(s), an assumption that may not always hold. Attempts have been made to
directly perturb the latent space of GANs to produce adversarial fake faces
that can circumvent forensic classifiers. In this work, we go one step further
and show that it is possible to successfully generate adversarial fake faces
with a specified set of attributes (e.g., hair color, eye size, race, gender,
etc.). To achieve this goal, we leverage the state-of-the-art generative model
StyleGAN with disentangled representations, which enables a range of
modifications without leaving the manifold of natural images. We propose a
framework to search for adversarial latent codes within the feature space of
StyleGAN, where the search can be guided either by a text prompt or a reference
image. We also propose a meta-learning based optimization strategy to achieve
transferable performance on unknown target models. Extensive experiments
demonstrate that the proposed approach can produce semantically manipulated
adversarial fake faces, which are true to the specified attribute set and can
successfully fool forensic face classifiers, while remaining undetectable by
humans. Code: https://github.com/koushiksrivats/face_attribute_attack
Comment: Accepted in CVPR 2023. Project page:
https://koushiksrivats.github.io/face_attribute_attack
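In highly simplified form, the attribute-guided adversarial latent search described above might be sketched as follows. Everything here is an illustrative stand-in: `forensic_score` and `attribute_loss` are toy surrogates (the paper uses real forensic classifiers and text/image guidance over StyleGAN's latent space), and simple random search replaces the paper's optimization and meta-learning strategy:

```python
import random

def forensic_score(z):
    # Stand-in for a forensic classifier's "fake" score (higher = more
    # likely to be flagged); the real attack queries a trained detector.
    return sum(v * v for v in z)

def attribute_loss(z, target):
    # Stand-in for attribute guidance from a text prompt or reference image
    # (the paper uses CLIP-style similarity); here: distance to a target code.
    return sum((a - b) ** 2 for a, b in zip(z, target))

def search_adversarial_latent(z0, target, steps=500, sigma=0.05, seed=0):
    # Random search over the latent code: jointly lower the detector score
    # and stay true to the specified attributes.
    rng = random.Random(seed)
    best = list(z0)
    best_loss = forensic_score(best) + attribute_loss(best, target)
    for _ in range(steps):
        cand = [v + rng.gauss(0.0, sigma) for v in best]
        loss = forensic_score(cand) + attribute_loss(cand, target)
        if loss < best_loss:
            best, best_loss = cand, loss
    return best, best_loss
```

Because the search perturbs the latent code rather than image pixels, every candidate stays on the generator's image manifold, which is why no visible noise pattern is introduced.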
CLIP2Protect: Protecting Facial Privacy using Text-Guided Makeup via Adversarial Latent Search
The success of deep learning based face recognition systems has given rise to
serious privacy concerns due to their ability to enable unauthorized tracking
of users in the digital world. Existing methods for enhancing privacy fail to
generate naturalistic images that can protect facial privacy without
compromising user experience. We propose a novel two-step approach for facial
privacy protection that relies on finding adversarial latent codes in the
low-dimensional manifold of a pretrained generative model. The first step
inverts the given face image into the latent space and finetunes the generative
model to achieve an accurate reconstruction of the given image from its latent
code. This step produces a good initialization, aiding the generation of
high-quality faces that resemble the given identity. Subsequently, user-defined
makeup text prompts and identity-preserving regularization are used to guide
the search for adversarial codes in the latent space. Extensive experiments
demonstrate that faces generated by our approach have stronger black-box
transferability with an absolute gain of 12.06% over the state-of-the-art
facial privacy protection approach under the face verification task. Finally,
we demonstrate the effectiveness of the proposed approach for commercial face
recognition systems. Our code is available at
https://github.com/fahadshamshad/Clip2Protect
Comment: Accepted in CVPR 2023. Project page:
https://fahadshamshad.github.io/Clip2Protect
Sparks of Large Audio Models: A Survey and Outlook
This survey paper provides a comprehensive overview of the recent
advancements and challenges in applying large language models to the field of
audio signal processing. Audio processing, with its diverse signal
representations and a wide range of sources--from human voices to musical
instruments and environmental sounds--poses challenges distinct from those
found in traditional Natural Language Processing scenarios. Nevertheless,
\textit{Large Audio Models}, epitomized by transformer-based architectures,
have shown marked efficacy in this sphere. By leveraging massive amounts of
data, these models have demonstrated prowess in a variety of audio tasks,
spanning from Automatic Speech Recognition and Text-To-Speech to Music
Generation, among others. Notably, Foundational Audio Models such as
SeamlessM4T have recently begun to act as universal translators, supporting
multiple speech tasks across up to 100 languages without relying on separate
task-specific systems. This paper presents an in-depth
analysis of state-of-the-art methodologies regarding \textit{Foundational Large
Audio Models}, their performance benchmarks, and their applicability to
real-world scenarios. We also highlight current limitations and provide
insights into potential future research directions in the realm of
\textit{Large Audio Models} with the intent to spark further discussion,
thereby fostering innovation in the next generation of audio-processing
systems. Furthermore, to keep pace with the rapid development in this area, we
will continually update the repository with recent articles and their
open-source implementations at
https://github.com/EmulationAI/awesome-large-audio-models
Comment: work in progress, Repo URL:
https://github.com/EmulationAI/awesome-large-audio-model